Transformer induced enhanced feature engineering for contextual similarity detection in text
Authors
Abstract
The availability of large data storage systems has resulted in the digitization of information. Question and answering communities like Quora and Stack Overflow take advantage of such systems to provide information to users. However, as the amount of stored information gets larger, it becomes difficult to keep track of the existing information, especially duplication. This work presents a similarity detection technique that can be used to identify similarity levels in textual data based on the context in which the information was provided. The proposed technique, transformer-based contextual similarity detection (TCSD), uses a combination of bidirectional encoder representations from transformers (BERT) and similarity metrics to derive features from the data. The derived features are used to train an ensemble model for similarity detection. Experiments were performed using a question data set. Results and comparisons indicate that the proposed model detects similarity with an accuracy of 92.5%, representing high efficiency.
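The feature-derivation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity metrics TCSD uses, so cosine similarity, Euclidean distance, and Manhattan distance are assumptions here, and the hypothetical `embed` function is a hashed bag-of-words stand-in for a real BERT sentence embedding so that the example runs without external dependencies.

```python
import math
import zlib
from typing import Dict, List

DIM = 64  # hypothetical vector size for the stand-in encoder


def embed(text: str, dim: int = DIM) -> List[float]:
    """Stand-in for a BERT sentence embedding (assumption): hashed token counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def similarity_features(a: List[float], b: List[float]) -> Dict[str, float]:
    """Derive similarity-metric features from two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors (for example, empty input text)
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return {
        "cosine": cosine,
        "euclidean": math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
        "manhattan": sum(abs(x - y) for x, y in zip(a, b)),
    }


def pair_features(question_1: str, question_2: str) -> Dict[str, float]:
    """Feature row for one question pair, suitable as classifier input."""
    return similarity_features(embed(question_1), embed(question_2))


if __name__ == "__main__":
    print(pair_features("How do I learn Python?", "What is the best way to learn Python?"))
```

In a fuller pipeline, `embed` would be replaced by a BERT encoder, and the resulting feature rows, labeled as duplicate or not, would be passed to an ensemble classifier of the reader's choice.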
Similar resources
Contextual feature selection for text classification
We present a simple approach for the classification of “noisy” documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call for tender documents, the method can be useful for other web collections that also contain non-topical contents. Experiments are c...
Feature Engineering for Text Classification
Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify our hypothesis that they could improve the perfor...
Contextual Anomaly Detection in Text Data
We propose using side information to further inform anomaly detection algorithms of the semantic context of the text data they are analyzing, thereby considering both divergence from the statistical pattern seen in particular datasets and divergence seen from more general semantic expectations. Computational experiments show that our algorithm performs as expected on data that reflect real-worl...
Similarity Guided Feature Labeling for Lesion Detection
The performance of automatic lesion detection is often affected by the intra- and inter-subject feature variations of lesions and normal anatomical structures. In this work, we propose a similarity-guided sparse representation method for image patch labeling, with three aspects of similarity information modeling, to reduce the chance that the best reconstruction of a feature vector does not pro...
Features Based Text Similarity Detection
As the Internet helps us cross cultural borders by providing different information, plagiarism issues are bound to arise. As a result, plagiarism detection becomes more demanding in overcoming this issue. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those detection tools. However, ...
Journal
Journal title: Bulletin of Electrical Engineering and Informatics
Year: 2022
ISSN: 2302-9285
DOI: https://doi.org/10.11591/eei.v11i4.3284